Library Imports

from pyspark.sql import SparkSession
from pyspark.sql import types as T

from pyspark.sql import functions as F

from datetime import datetime
from decimal import Decimal


spark = (
    SparkSession.builder
    .appName("Section 2.8 - Case Statements")
    .config("spark.some.config.option", "some-value")
    .getOrCreate()
)

sc = spark.sparkContext

import os

data_path = "/data/pets.csv"
base_path = os.path.dirname(os.getcwd())
path = base_path + data_path
pets = spark.read.csv(path, header=True)
pets.toPandas()

   id  breed_id  nickname  birthday             age  color
0  1   1         King      2014-11-22 12:30:31  5    brown
1  2   3         Argus     2016-11-22 10:05:10  10   None
2  3   1         Chewie    2016-11-22 10:05:10  15   None
3  3   2         Maple     2018-11-22 10:05:10  17   white
4  4   2         None      2019-01-01 10:05:10  13   None

Case Statements

Case statements are used to map values based on a series of conditions, for example:


  • if x then a
  • if y then b
  • everything else c
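The rule behind this list works as follows:

  • Conditions are checked top to bottom.
  • The first condition that holds decides the result.
  • The fallback applies only when no condition matches.

Below is a minimal plain-Python sketch of that rule. It uses a hypothetical helper, `case_when`, which is not part of PySpark. The helper takes `(condition, result)` pairs and a default. The `None` default mirrors the null that Spark returns when a `F.when` chain has no `.otherwise()`.

```python
from typing import Any, Callable, Iterable, Optional, Tuple


def case_when(
    value: Any,
    branches: Iterable[Tuple[Callable[[Any], bool], Any]],
    default: Optional[Any] = None,
) -> Any:
    """Hypothetical helper (not a PySpark API): return the result of the
    first branch whose condition holds for `value`, otherwise `default`."""
    for condition, result in branches:
        if condition(value):
            return result  # first match wins; later branches are never checked
    return default  # the "everything else" case


# "if x then a", "if y then b", "everything else c"
branches = [
    (lambda v: v == "x", "a"),
    (lambda v: v == "y", "b"),
]

print(case_when("x", branches, default="c"))  # a
print(case_when("y", branches, default="c"))  # b
print(case_when("z", branches, default="c"))  # c
print(case_when("z", branches))               # None, like a when() chain without otherwise()
```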

Using Switch/Case Statements in Spark

# Map each pet's age to a label; rows matching neither condition fall through to otherwise().
pets.withColumn('oldness_value',
    F.when(F.col('age') <= 5, 'young')
     .when((F.col('age') > 5) & (F.col('age') <= 10), 'middle age')
     .otherwise('old')).toPandas()

   id  breed_id  nickname  birthday             age  color  oldness_value
0  1   1         King      2014-11-22 12:30:31  5    brown  young
1  2   3         Argus     2016-11-22 10:05:10  10   None   middle age
2  3   1         Chewie    2016-11-22 10:05:10  15   None   old
3  3   2         Maple     2018-11-22 10:05:10  17   white  old
4  4   2         None      2019-01-01 10:05:10  13   None   old

What Happened?

Based on the age of each pet, we classified it as young, middle age, or old. Please don't take offense; this is merely an example.

We mapped the logic of:

  • If their age is less than or equal to 5, then they are considered young.
  • If their age is greater than 5 but less than or equal to 10, then they are considered middle age.
  • Anyone older is considered old.
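The three rules above can be checked with a plain-Python function applied to the ages in the pets table. The function name `oldness_value` is hypothetical and simply reuses the column name. This is an analogy to the Spark expression, not Spark code.

Branches in a `F.when(...).when(...)` chain are tried in order, just like `if`/`elif`. The second branch is therefore only reached when age > 5. So `age <= 10` alone would behave the same as the `(age > 5) & (age <= 10)` condition used above.

```python
def oldness_value(age: int) -> str:
    """Hypothetical plain-Python mirror of the when/when/otherwise chain."""
    if age <= 5:
        return "young"
    elif age <= 10:  # only reached when age > 5, so that check is implied
        return "middle age"
    else:
        return "old"  # plays the role of .otherwise('old')


# Ages from the pets table: King, Argus, Chewie, Maple and the unnamed pet
ages = [5, 10, 15, 17, 13]
print([oldness_value(a) for a in ages])
# ['young', 'middle age', 'old', 'old', 'old']
```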


  • We learned how to map values based on case statements, with a default value for when none of the conditions are satisfied.
